8 results for Classification and Regression Trees

in DigitalCommons@The Texas Medical Center


Abstract:

Bladder cancer is the fourth most common cancer in men in the United States. There is compelling evidence that genetic variations contribute to the risk and outcomes of bladder cancer. The PI3K-AKT-mTOR pathway is a major cellular pathway involved in proliferation, invasion, inflammation, tumorigenesis, and drug response. Somatic aberrations of the PI3K-AKT-mTOR pathway are frequent events in several cancers, including bladder cancer; however, no studies have investigated the role of germline genetic variations in this pathway in bladder cancer. In this project, we used a large case-control study to evaluate the associations of a comprehensive catalogue of SNPs in this pathway with bladder cancer risk and outcomes. Three SNPs in RAPTOR were significantly associated with susceptibility: rs11653499 (OR: 1.79, 95% CI: 1.24–2.60), rs7211818 (OR: 2.13, 95% CI: 1.35–3.36), and rs7212142 (OR: 1.57, 95% CI: 1.19–2.07). Two haplotypes constructed from these three SNPs were also associated with bladder cancer risk. In combined analysis, risk increased significantly with the number of unfavorable genotypes (P for trend < 0.001). Classification and regression tree analysis identified potential gene-environment interactions between RPS6KA5 rs11653499 and smoking. In superficial bladder cancer, we found that PTEN rs1234219 and rs11202600, TSC1 rs7040593, RAPTOR rs901065, and PIK3R1 rs251404 were significantly associated with recurrence in patients receiving BCG. In muscle-invasive and metastatic bladder cancer, AKT2 rs3730050, PIK3R1 rs10515074, and RAPTOR rs9906827 were associated with survival. Survival tree analysis revealed potential gene-gene interactions: patients carrying the unfavorable genotypes of PTEN rs1234219 and TSC1 rs704059 exhibited a 5.24-fold (95% CI: 2.44–11.24) increased risk of recurrence.
In combined analysis, as the number of unfavorable genotypes increased, there was a significant trend toward higher risk of recurrence and death (P for trend < 0.001) in Cox proportional hazards regression analysis, and shorter event-free (recurrence- and death-free) survival in Kaplan-Meier estimates (log-rank P < 0.001). This study strongly suggests that genetic variations in the PI3K-AKT-mTOR pathway play an important role in bladder cancer development. The identified SNPs, if validated in further studies, may become valuable biomarkers for assessing an individual's cancer risk, predicting prognosis and treatment response, and helping physicians make individualized treatment decisions.
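The odds ratios and 95% confidence intervals quoted in this abstract can be illustrated with a minimal sketch. It computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The function name and the counts are hypothetical, and the study itself most likely used adjusted regression models, so this is not a reproduction of its analysis.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases carrying the variant      b: cases without the variant
    c: controls carrying the variant   d: controls without the variant
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only for illustration.
or_, lo, hi = odds_ratio_wald_ci(60, 140, 40, 160)
print(f"OR: {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1.0, as in this hypothetical table, is what the abstract means by a significant association.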

Abstract:

Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of patients surviving more than 5 years after diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. To evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e., short- vs. long-term survival), and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Because of the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death directly to the gene expression profile. We propose a Bayesian ensemble model for survival prediction that is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy by using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting.
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.

Abstract:

Environmental data sets of pollutant concentrations in air, water, and soil frequently include unquantified sample values reported only as being below the analytical method detection limit. These values, referred to as censored values, should be considered in the estimation of distribution parameters, as each represents some value of pollutant concentration between zero and the detection limit. Most of the currently accepted methods for estimating the population parameters of environmental data sets containing censored values rely upon the assumption of an underlying normal (or transformed normal) distribution. This assumption can result in unacceptable levels of error in parameter estimation due to the unbounded left tail of the normal distribution. With the beta distribution, which is bounded over the same range as a distribution of concentrations, $0 \le x \le 1$, parameter estimation errors resulting from improper distribution bounds are avoided. This work developed a method that uses the beta distribution to estimate population parameters from censored environmental data sets and evaluated its performance in comparison to currently accepted methods that rely upon an underlying normal (or transformed normal) distribution. Data sets were generated assuming typical values encountered in environmental pollutant evaluation for mean, standard deviation, and number of variates. For each set of model values, data sets were generated assuming that the data were distributed either normally, lognormally, or according to a beta distribution. For varying levels of censoring, two established methods of parameter estimation, regression on normal order statistics and regression on lognormal order statistics, were used to estimate the known mean and standard deviation of each data set. The method developed for this study, employing a beta distribution assumption, was also used to estimate parameters, and the relative accuracy of all three methods was compared.
For data sets of all three distribution types, and for censoring levels up to 50%, the performance of the new method equaled, if not exceeded, the performance of the two established methods. Because of its robustness in parameter estimation regardless of distribution type or censoring level, the method employing the beta distribution should be considered for full development in estimating parameters for censored environmental data sets.
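One of the two established baselines named in this abstract, regression on lognormal order statistics, can be sketched as follows. This is a simplified illustration and not the study's implementation: it assumes a single detection limit (so all censored values occupy the lowest ranks), Blom plotting positions, and strictly positive detected values. The function name and the sample data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ros_lognormal(detected, n_censored):
    """Estimate mean and standard deviation of a left-censored sample.

    detected:   quantified concentrations (all above the detection limit)
    n_censored: number of samples reported only as below the detection limit
    """
    if len(detected) < 2:
        raise ValueError("need at least two detected values to fit a line")
    n = len(detected) + n_censored
    normal = NormalDist()
    # Normal quantiles of Blom plotting positions for ranks 1..n.
    z = [normal.inv_cdf((rank - 0.375) / (n + 0.25)) for rank in range(1, n + 1)]
    z_censored, z_detected = z[:n_censored], z[n_censored:]
    y = [math.log(v) for v in sorted(detected)]
    # Ordinary least squares of log-concentration on normal quantile.
    z_bar, y_bar = mean(z_detected), mean(y)
    sxx = sum((zi - z_bar) ** 2 for zi in z_detected)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z_detected, y))
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    # Impute the censored values from the fitted line, then summarize.
    imputed = [math.exp(intercept + slope * zi) for zi in z_censored]
    full = imputed + sorted(detected)
    return mean(full), stdev(full)


# Hypothetical data: six quantified values and four non-detects.
sample = [0.6, 0.8, 1.1, 1.5, 2.3, 3.9]
est_mean, est_sd = ros_lognormal(sample, n_censored=4)
print(est_mean, est_sd)
```

The beta-distribution method proposed in the abstract would replace the lognormal assumption in the fitting step; the abstract does not give enough detail to sketch it here.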

Abstract:

Historically, morphological features were used as the primary means to classify organisms. However, the age of molecular genetics has allowed us to approach this field from the perspective of the organism's genetic code. Early work used highly conserved sequences, such as ribosomal RNA. The increasing number of complete genomes in the public data repositories provides the opportunity to look not only at a single gene, but at an organism's entire parts list.

Here the Sequence Comparison Index (SCI) and the Organism Comparison Index (OCI), algorithms and methods to compare proteins and proteomes, are presented. The complete proteomes of 104 sequenced organisms were compared. Over 280 million full Smith-Waterman alignments were performed on sequence pairs that had a reasonable expectation of being related. From these alignments a whole-proteome phylogenetic tree was constructed. This method was also used to compare the small subunit (SSU) rRNA from each organism, and a tree was constructed from these results. The SSU rRNA tree produced by the SCI/OCI method closely resembles accepted SSU rRNA trees from sources such as the Ribosomal Database Project, thus validating the method. The SCI/OCI proteome tree showed a number of small but significant differences when compared to the SSU rRNA tree and to proteome trees constructed by other methods. Horizontal gene transfer does not appear to affect the SCI/OCI trees until the transferred genes make up a large portion of the proteome.

As part of this work, the Database of Related Local Alignments (DaRLA) was created; it contains over 81 million rows of sequence alignment information. DaRLA, while primarily used to build the whole-proteome trees, can also be applied to shared gene content analysis, gene order analysis, and the creation of individual protein trees.

Finally, the standard BLAST method for analyzing shared gene content was compared to the SCI method using 4 spirochetes. The SCI system performed flawlessly, finding all proteins from one organism against itself and finding all the ribosomal proteins between organisms. The BLAST system missed some proteins from its respective organism and failed to detect small ribosomal proteins between organisms.
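The Smith-Waterman local alignment underlying the comparisons in this abstract can be sketched as below. The abstract does not state the scoring scheme, so the match, mismatch, and linear gap values are illustrative assumptions. Protein alignments normally use a substitution matrix and affine gap penalties, which this sketch omits. It returns only the best local score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                               # start a new local alignment
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


# Toy sequences sharing the local segment "ACGT".
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Keeping only two rows of the scoring matrix holds memory linear in the length of the second sequence, which matters when running alignments at the scale the abstract describes.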

Abstract:

A census of 925 U.S. colleges and universities offering master's and doctoral degrees was conducted to study the number of elements of an environmental management system, as defined by ISO 14001, possessed by small, medium, and large institutions. A 30% response rate was achieved, with 273 responses included in the final data analysis. Overall, the number of ISO 14001 elements implemented among the 273 institutions ranged from 0 to 16, with a median of 12. There was no significant association between the number of elements implemented and the size of the institution (p = 0.18; Kruskal-Wallis test) or the USEPA region (p = 0.12; Kruskal-Wallis test). The proportion of U.S. colleges and universities that reported having implemented a structured, comprehensive environmental management system, defined by answering yes to all 16 elements, was 10% (95% C.I. 6.6%–14.1%); however, 38% (95% C.I. 32.0%–43.8%) reported that they had implemented a structured, comprehensive environmental management system, while 30.0% (95% C.I. 24.7%–35.9%) are planning to implement a comprehensive environmental management system within the next five years. Stratified analyses were performed by institution size, Carnegie Classification, and job title.

The Osnabruck model, and another under development by the South Carolina Sustainable Universities Initiative, are the only two environmental management system models that have been proposed specifically for colleges and universities, although several guides are now available. The Environmental Management System Implementation Model for U.S. Colleges and Universities developed here is an adaptation of the ISO 14001 standard and USEPA recommendations and has been tailored to U.S. colleges and universities for use in streamlining the implementation process. In using this implementation model created for the U.S. research and academic setting, it is hoped that these highly specialized institutions will be provided with a clearer and more cost-effective path towards the implementation of an EMS and greater compliance with local, state, and federal environmental legislation.

Abstract:

Cross-sectional age- and sex-specific distributions of serum total cholesterol were described for 1091 children aged 6–18 years in The Woodlands, Texas. Associations of serum total cholesterol with five anthropometric measurements (weight, height, body mass index, arm circumference, and triceps skinfold thickness) were examined by correlation and regression analyses. Examination of serum total cholesterol distributions showed lower levels in boys than in girls for most of the age groups studied. Mean levels of total cholesterol peaked at age 9 for boys and 8 for girls. Serum total cholesterol leveled off until age 14 for boys and 11 for girls, and then dropped through age 18 for both boys and girls. These results support the hypothesis that serum total cholesterol concentration drops at pre-adolescence.

Age-adjusted correlations were observed between serum total cholesterol and triceps skinfold thickness for both boys and girls. This association was stronger in boys. Triceps skinfold thickness and arm circumference were consistently the strongest correlates of serum total cholesterol in boys. Weight and arm circumference were consistently the strongest correlates of serum total cholesterol in girls.

Abstract:

This study explored the relationship of attitudes, needs, and health services utilization patterns of elderly veterans who were identified and categorized by their expectation for and receipt of sick-role legitimation. Three prescription types (new, change, renewal) were defined as the operational variables. A population of 676 ambulatory, chronically ill veterans (average age 60 years) was sent a questionnaire (74% response rate). In addition, a retrospective review of medical and prescription records was performed for a 45% sample of respondents. The results were analyzed using discriminant function and regression analysis. Fewer than 20% of the veterans responding expected to receive more prescriptions than were presently prescribed, whereas over 80% expected refill authorizations. Distinct attitudinal, need, and utilization patterns were identified.

Abstract:

The interplay between obesity, physical activity, weight gain, and genetic variants in the mTOR pathway has not been studied in renal cell carcinoma (RCC). We examined the associations between obesity, weight gain, physical activity, and RCC risk. We also analyzed whether genetic variants in the mTOR pathway could modify these associations. Incident renal cell carcinoma cases and healthy controls were recruited from the University of Texas MD Anderson Cancer Center in Houston, Texas. Cases and controls were frequency-matched by age (±5 years), ethnicity, sex, and county of residence. Epidemiologic data were collected via in-person interview. A total of 577 cases and 593 healthy controls (all white) were included. One hundred ninety-two (192) SNPs from 22 genes were available, and their genotyping data were extracted from previous genome-wide association studies. Logistic regression and regression splines were used to obtain odds ratios. Obesity at age 20, at age 40, and 3 years prior to diagnosis/recruitment, and moderate and large weight gain from age 20 to 40, were each significantly associated with increased RCC risk. Low physical activity was associated with a 4.08-fold (95% CI: 2.92–5.70) increased risk. Five single nucleotide polymorphisms (SNPs) were significantly associated with RCC risk, and their cumulative effect increased the risk by up to 72% (95% CI: 1.20–2.46). Stratum-specific effects for weight change and cumulative genotype groups were observed. However, no interaction was suggested by our study. In conclusion, energy balance-related risk factors and genetic variants in the mTOR pathway may jointly influence susceptibility to RCC.